Spark 3.3: Improve task and job abort handling #6876

Merged
aokolnychyi merged 1 commit into apache:master from aokolnychyi:improve-abort on Feb 23, 2023
Conversation

@aokolnychyi (Contributor)

This PR improves our task and job abort handling in Spark 3.3.

  • This change leverages bulk deletes whenever possible.
  • This change adds helpful log messages that indicate how many files were deleted and, when available, the task context.
[Executor task launch worker for task 0.0 in stage 0.0 (TID 0)] ERROR org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask - Aborting commit for partition 0 (task 0, attempt 0, stage 0.0)
[Executor task launch worker for task 0.0 in stage 0.0 (TID 0)] INFO org.apache.iceberg.spark.source.SparkCleanupUtil - Deleted 2 file(s) (partition 0 (task 0, attempt 0, stage 0.0))
...
[Test worker] ERROR org.apache.spark.sql.execution.datasources.v2.AppendDataExec - Data source write support IcebergBatchWrite(table=testhive.default.table, format=PARQUET) is aborting.
[Test worker] INFO org.apache.iceberg.spark.source.SparkCleanupUtil - Deleted 0 file(s) (job abort)
[Test worker] ERROR org.apache.spark.sql.execution.datasources.v2.AppendDataExec - Data source write support IcebergBatchWrite(table=testhive.default.table, format=PARQUET) aborted.
[Executor task launch worker for task 0.0 in stage 3.0 (TID 4)] ERROR org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask - Aborting commit for partition 0 (task 4, attempt 0, stage 3.0)
[Executor task launch worker for task 0.0 in stage 3.0 (TID 4)] INFO org.apache.iceberg.spark.source.SparkCleanupUtil - Deleted 2 file(s) using bulk deletes (partition 0 (task 4, attempt 0, stage 3.0))
...
[Test worker] ERROR org.apache.spark.sql.execution.datasources.v2.AppendDataExec - Data source write support IcebergBatchWrite(table=testhivebulk.default.table, format=PARQUET) is aborting.
[Test worker] INFO org.apache.iceberg.spark.source.SparkCleanupUtil - Deleted 0 file(s) using bulk deletes (job abort)
[Test worker] ERROR org.apache.spark.sql.execution.datasources.v2.AppendDataExec - Data source write support IcebergBatchWrite(table=testhivebulk.default.table, format=PARQUET) aborted.
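As a rough illustration of the cleanup strategy behind those log lines (this is a hypothetical sketch, not the actual SparkCleanupUtil code; the BulkIo interface and method names are illustrative), the abort path prefers a single bulk-delete call when the IO supports it and falls back to per-file deletes otherwise:

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of the abort cleanup strategy; BulkIo and the names
// below are illustrative, not Iceberg's actual API.
public class CleanupSketch {

  // IO that can delete many files in one call (e.g. an S3-style batch delete).
  public interface BulkIo {
    void deleteFiles(Iterable<String> paths);
  }

  // Returns the number of files handed to the delete path, which is what a
  // "Deleted N file(s)" log message would report.
  public static int delete(List<String> paths, BulkIo bulkIo, Consumer<String> singleDelete) {
    if (bulkIo != null) {
      bulkIo.deleteFiles(paths); // one bulk round trip instead of N calls
    } else {
      paths.forEach(singleDelete); // per-file fallback
    }
    return paths.size();
  }
}
```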

@github-actions github-actions bot added the spark label Feb 17, 2023
Map<String, String> props = table.properties();
Tasks.foreach(files(messages))
.executeWith(ThreadPools.getWorkerPool())
.retry(PropertyUtil.propertyAsInt(props, COMMIT_NUM_RETRIES, COMMIT_NUM_RETRIES_DEFAULT))
@aokolnychyi (Contributor, Author)

I don't think it is reasonable to use the commit retry mechanism for deletes; this was the only place we did that. For now, I added some default configs in SparkCleanupUtil. I doubt we want to make them configurable.
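The idea of fixed, non-configurable cleanup retries (instead of reusing the table's commit retry properties) could be sketched like this; the constant and helper are illustrative, not Iceberg's actual code:

```java
// Hypothetical sketch: a fixed retry budget for cleanup deletes, decoupled
// from the table's COMMIT_NUM_RETRIES property. DELETE_NUM_RETRIES is an
// illustrative default, not a real Iceberg constant.
public class RetrySketch {

  static final int DELETE_NUM_RETRIES = 3;

  // Runs the action, retrying on RuntimeException up to the fixed budget.
  public static void runWithRetries(Runnable action) {
    RuntimeException last = null;
    for (int attempt = 0; attempt <= DELETE_NUM_RETRIES; attempt++) {
      try {
        action.run();
        return; // success, stop retrying
      } catch (RuntimeException e) {
        last = e; // remember the failure and try again
      }
    }
    throw last; // budget exhausted, surface the last failure
  }
}
```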

: ImmutableList.of()));
}
return ImmutableList.of();
private List<DataFile> files(WriterCommitMessage[] messages) {
@aokolnychyi (Contributor, Author)

I need a list to know the collection size.

@szehon-ho (Member), Feb 18, 2023

I have a little concern about memory: now we are materializing paths into a List instead of keeping them as an Iterable (if they are originally). I see it's mostly to log sizes; I wonder if we couldn't implement a wrapping counting Iterable for that?

Contributor

I'm okay either way; it seems like we were previously materializing the WriterCommitMessages, which have the files, anyway. Using S3 as an example, it takes 1 million objects with the worst-case key length of 1024 bytes to use about 1 GB of memory.
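A quick back-of-the-envelope check of that estimate (raw key bytes only; Java object and char overhead would increase the real footprint):

```java
// Sanity check of the memory estimate above: a million S3 keys at the
// maximum key length of 1024 bytes is roughly 1 GB of raw key data.
public class KeyMemoryEstimate {
  public static long rawKeyBytes(long objectCount, long keyLengthBytes) {
    return objectCount * keyLengthBytes;
  }
}
```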

@szehon-ho (Member), Feb 20, 2023

That's true. Actually, @amogh-jahagirdar was wondering: is there a reason we don't have deleteFiles() return the number of deleted files? It would probably be more convenient for callers to log the size that way.

@aokolnychyi (Contributor, Author)

I changed the code to keep a list of files (it shouldn't cost anything extra, as those files are already there) and switched to using Lists.transform(), which is a lazy transform, in SparkCleanupUtil.
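For reference, a lazy transform returns a view that converts elements on access, so no second list of paths is materialized while size() stays available for logging. A minimal sketch of the idea behind Guava's Lists.transform (this is an illustrative reimplementation, not Guava's code):

```java
import java.util.AbstractList;
import java.util.List;
import java.util.function.Function;

// Minimal sketch of the lazy-view idea behind Guava's Lists.transform: the
// function is applied inside get(), so no transformed copy is ever built.
public class LazyTransformSketch {

  public static <F, T> List<T> transform(List<F> from, Function<? super F, ? extends T> fn) {
    return new AbstractList<T>() {
      @Override
      public T get(int index) {
        return fn.apply(from.get(index)); // computed on access, not stored
      }

      @Override
      public int size() {
        return from.size(); // size available without materializing elements
      }
    };
  }
}
```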

@aokolnychyi (Contributor, Author)

@szehon-ho @amogh-jahagirdar, could you take another look?

@aokolnychyi

}

// the format matches what Spark uses for internal logging
private static String taskInfo() {
Member

Nit: what do you think about moving the private method to the bottom? It breaks the flow of the code a bit (I would have liked to see deleteFiles right after deleteTaskFiles, as it's the main delegate).

@aokolnychyi (Contributor, Author)

I was trying to group methods by logic instead of access. My reasoning here was that taskInfo() is only invoked in this method and is directly related to deleteTaskFiles(). Let me know if that makes sense.

Member

Yeah, it's definitely subjective. I personally prefer to see the public methods and their Javadocs first, to get a high-level idea of what the class does before diving into details (especially given there are only two public methods in this class). But as it's a style preference, I'll leave it optional.

@amogh-jahagirdar (Contributor) left a comment


Thanks @aokolnychyi, great to see this improvement!

try {
io.deleteFiles(paths);
LOG.info("Deleted {} file(s) using bulk deletes ({})", paths.size(), context);

Contributor

Nit: unnecessary newline

@aokolnychyi (Contributor, Author)

We do this sometimes when either the try or catch block is non-trivial, to separate them.

if (cleanupOnAbort) {
SparkCleanupUtil.deletePaths("job abort", table.io(), filePaths(messages));
} else {
LOG.warn("Skipping cleanup of written files, unable to determine the final commit state");
Contributor

The "skipping cleanup of written files" part makes sense to me, but wouldn't "unable to determine the final commit state" apply in both cases (any abort case)? Or are we trying to indicate that we won't be cleaning up any orphan files?

@aokolnychyi (Contributor, Author)

I adapted the original comment, but I agree it is a bit weird, as the var name is generic and does not say anything about commit state. I changed it.

return "unknown task";
} else {
return String.format(
"partition %d (task %d, attempt %d, stage %d.%d)",
Contributor

Can we show the stage attempt ID better, something like
Task (id: <TaskID>, attempt: <attemptNumber>), Stage (id: <stageId>, attempt: <attemptNumber>)
in place of
(task 0, attempt 0, stage 0.0)?

@aokolnychyi (Contributor, Author)

My idea is to follow the exact format used in Spark so that we can easily match Spark and Iceberg logs.

[Executor task launch worker for task 0.0 in stage 0.0 (TID 0)] ERROR org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask - Aborting commit for partition 0 (task 0, attempt 0, stage 0.0)
[Executor task launch worker for task 0.0 in stage 0.0 (TID 0)] INFO org.apache.iceberg.spark.source.SparkCleanupUtil - Deleted 2 file(s) (partition 0 (task 0, attempt 0, stage 0.0))

In this example, it is clear that these two records belong to the same context, even though they were produced by Spark and Iceberg. If we change the format, it won't be obvious.
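The Spark-matching format could be sketched like this (the signature here is illustrative; the real method would read these values from Spark's TaskContext rather than take parameters):

```java
// Illustrative sketch of formatting task info to match Spark's own logging,
// "partition %d (task %d, attempt %d, stage %d.%d)". In practice these
// values would come from Spark's TaskContext, not method parameters.
public class TaskInfoSketch {

  public static String taskInfo(
      int partitionId, long taskAttemptId, int attemptNumber, int stageId, int stageAttemptNumber) {
    return String.format(
        "partition %d (task %d, attempt %d, stage %d.%d)",
        partitionId, taskAttemptId, attemptNumber, stageId, stageAttemptNumber);
  }
}
```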

@aokolnychyi aokolnychyi force-pushed the improve-abort branch 2 times, most recently from f669b92 to b73396c Compare February 21, 2023 22:13
@aokolnychyi (Contributor, Author)

@szehon-ho @amogh-jahagirdar @singhpk234, could you take another look?

@szehon-ho (Member) left a comment


Looks good to me; small comments for consideration.

/**
* Attempts to delete as many given files as possible.
*
* @param context a helpful description of the context in which this method is invoked
Member

Nit: thanks for the Javadoc. How about 'a helpful description of the operation invoking this method' (to avoid reusing 'context' to define itself)? Not sure it's completely accurate, though.

@aokolnychyi (Contributor, Author)

I like it, let me change.

@aokolnychyi aokolnychyi merged commit 3efaee1 into apache:master Feb 23, 2023
@aokolnychyi

Thanks for reviewing, @szehon-ho @singhpk234 @amogh-jahagirdar!
